[feature](file cache)Import file cache for remote file reader #15622

Merged
morningman merged 14 commits into apache:master from BePPPower:fileCache on Jan 10, 2023

@BePPPower BePPPower commented Jan 4, 2023

Proposed changes

Issue Number: close #15456

The main purpose of this PR is to introduce a file cache for lakehouse reads of remote files.
The local disk is used as a cache for remote file reads, so the next time the same file is read, the data can be served directly from local disk.
In addition, this PR includes a few other minor changes.

Problem summary

Import File Cache:

  1. The introduced file cache is called block_file_cache and uses an LRU replacement policy.
  2. Implement a new FileReader, CachedRemoteFileReader, so that the file cache logic is hidden behind CachedRemoteFileReader.
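
For illustration, the two points above can be sketched as a small block cache with LRU eviction plus a reader wrapper that consults the cache before going to the remote source. This is a minimal in-memory sketch and not the code in this PR: `LruBlockCache`, `CachedReaderSketch`, `RemoteRead`, the fixed block size, and the `remote_reads` counter are hypothetical names chosen for the example, and the actual block_file_cache keeps its data on local disk.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <functional>
#include <list>
#include <string>
#include <unordered_map>
#include <utility>

// Hypothetical block cache with LRU eviction (in memory for brevity).
class LruBlockCache {
public:
    explicit LruBlockCache(size_t capacity) : _capacity(capacity) {}

    // Returns the cached block, or nullptr on a miss. A hit marks the block
    // as most recently used.
    const std::string* get(size_t block_id) {
        auto it = _index.find(block_id);
        if (it == _index.end()) return nullptr;
        _lru.splice(_lru.begin(), _lru, it->second);
        return &it->second->second;
    }

    // Inserts or refreshes a block, evicting the least recently used one
    // when the cache is full.
    void put(size_t block_id, std::string data) {
        auto it = _index.find(block_id);
        if (it != _index.end()) {
            it->second->second = std::move(data);
            _lru.splice(_lru.begin(), _lru, it->second);
            return;
        }
        if (_capacity == 0) return;
        if (_lru.size() >= _capacity) {
            _index.erase(_lru.back().first);
            _lru.pop_back();
        }
        _lru.emplace_front(block_id, std::move(data));
        _index[block_id] = _lru.begin();
    }

private:
    using Entry = std::pair<size_t, std::string>;
    size_t _capacity;
    std::list<Entry> _lru; // front = most recently used
    std::unordered_map<size_t, std::list<Entry>::iterator> _index;
};

// Stand-in for a remote read: (offset, length) -> bytes, shorter at end of file.
using RemoteRead = std::function<std::string(size_t offset, size_t len)>;

// Hypothetical reader that hides the cache from its callers.
class CachedReaderSketch {
public:
    CachedReaderSketch(RemoteRead remote, size_t block_size, size_t capacity)
            : _remote(std::move(remote)), _block_size(block_size), _cache(capacity) {
        assert(block_size > 0);
    }

    // Reads [offset, offset + len), fetching only the blocks that are not cached.
    std::string read_at(size_t offset, size_t len) {
        std::string out;
        const size_t end = offset + len;
        while (offset < end) {
            const size_t block_id = offset / _block_size;
            std::string fetched;
            const std::string* block = _cache.get(block_id);
            if (block == nullptr) {
                ++remote_reads;
                fetched = _remote(block_id * _block_size, _block_size);
                _cache.put(block_id, fetched);
                block = &fetched;
            }
            const size_t in_block = offset - block_id * _block_size;
            if (in_block >= block->size()) break; // past end of file
            const size_t n = std::min(end - offset, block->size() - in_block);
            out.append(*block, in_block, n);
            offset += n;
        }
        return out;
    }

    size_t remote_reads = 0; // number of blocks fetched from the remote source

private:
    RemoteRead _remote;
    size_t _block_size;
    LruBlockCache _cache;
};
```

With a block size of 4 and a capacity of 2 blocks, reading bytes 2 to 6 fetches blocks 0 and 1 from the remote source, and a later read of bytes 0 to 3 is served entirely from the cache.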

Other changes:

  1. Add a new interface method fs() to FileReader.
  2. IOContext gains statistics fields that track FileCache usage.
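
As a rough sketch of how these two changes could fit together, the example below shows a reader interface that exposes its file system through `fs()` and updates cache counters passed in via an I/O context. All type names, fields, and signatures here (`FileSystem`, `FileCacheStats`, `IOContext::file_cache_stats`, `ToyReader`, the `read_at` signature) are assumptions made for illustration and are not the definitions used in this PR.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <memory>
#include <string>
#include <utility>

// Hypothetical stand-in for a file system handle.
struct FileSystem {
    std::string name;
};

// Hypothetical counters describing how reads were served.
struct FileCacheStats {
    int64_t num_cache_hits = 0;
    int64_t num_cache_misses = 0;
    int64_t bytes_from_cache = 0;
    int64_t bytes_from_remote = 0;
};

// Hypothetical I/O context carrying an optional pointer to the counters.
struct IOContext {
    FileCacheStats* file_cache_stats = nullptr;
};

class FileReader {
public:
    virtual ~FileReader() = default;
    // Reads up to len bytes at offset into *out and returns the byte count.
    virtual size_t read_at(size_t offset, size_t len, std::string* out, IOContext* ctx) = 0;
    // Returns the file system this reader belongs to.
    virtual std::shared_ptr<FileSystem> fs() const = 0;
};

// Toy reader: `cached` decides whether its reads count as hits or misses.
class ToyReader final : public FileReader {
public:
    ToyReader(std::shared_ptr<FileSystem> fs, std::string data, bool cached)
            : _fs(std::move(fs)), _data(std::move(data)), _cached(cached) {}

    size_t read_at(size_t offset, size_t len, std::string* out, IOContext* ctx) override {
        assert(out != nullptr);
        out->clear();
        if (offset >= _data.size()) return 0;
        *out = _data.substr(offset, len);
        if (ctx != nullptr && ctx->file_cache_stats != nullptr) {
            FileCacheStats& s = *ctx->file_cache_stats;
            const auto n = static_cast<int64_t>(out->size());
            if (_cached) {
                ++s.num_cache_hits;
                s.bytes_from_cache += n;
            } else {
                ++s.num_cache_misses;
                s.bytes_from_remote += n;
            }
        }
        return out->size();
    }

    std::shared_ptr<FileSystem> fs() const override { return _fs; }

private:
    std::shared_ptr<FileSystem> _fs;
    std::string _data;
    bool _cached;
};
```

Keeping the counters behind an optional pointer lets callers that do not care about cache statistics pass a context without them.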

Checklist(Required)

  1. Does it affect the original behavior:
    • Yes
    • No
    • I don't know
  2. Has unit tests been added:
    • Yes
    • No
    • No Need
  3. Has document been added or modified:
    • Yes
    • No
    • No Need
  4. Does it need to update dependencies:
    • Yes
    • No
  5. Are there any changes that cannot be rolled back:
    • Yes (If Yes, please explain WHY)
    • No


@BePPPower BePPPower changed the title from "[feature] Import file cache for remote file reader" to "[feature](file cache)Import file cache for remote file reader" Jan 4, 2023

@github-actions github-actions Bot left a comment

clang-tidy made some suggestions

Comment thread be/src/io/cloud/cached_remote_file_reader.h Outdated
Comment thread be/src/io/cloud/cached_remote_file_writer.cpp Outdated
Comment thread be/src/io/cloud/cached_remote_file_writer.cpp Outdated
Comment thread be/src/io/cloud/cached_remote_file_writer.cpp Outdated
Comment thread be/src/io/cloud/cached_remote_file_writer.h Outdated
Comment thread be/src/io/cloud/cached_remote_file_writer.h Outdated
Comment thread be/src/io/cloud/cached_remote_file_writer.h Outdated
Comment thread be/src/io/cloud/cached_remote_file_writer.h Outdated
Comment thread be/src/util/async_io.h Outdated
PriorityThreadPool* _io_thread_pool = nullptr;
PriorityThreadPool* _remote_thread_pool = nullptr;

private:

warning: redundant access specifier has the same accessibility as the previous access specifier [readability-redundant-access-specifiers]

Suggested change
private:

be/src/util/async_io.h:81: previously declared here

private:
^

Comment thread be/src/util/lock.h Outdated
Comment thread be/src/vec/exec/format/json/new_json_reader.h Outdated
Comment thread be/src/io/fs/file_reader_options.cpp Outdated
github-actions Bot commented Jan 6, 2023

clang-tidy review says "All clean, LGTM! 👍"

github-actions Bot commented Jan 7, 2023

clang-tidy review says "All clean, LGTM! 👍"

github-actions Bot commented Jan 7, 2023

clang-tidy review says "All clean, LGTM! 👍"

github-actions Bot commented Jan 9, 2023

clang-tidy review says "All clean, LGTM! 👍"

@BePPPower BePPPower marked this pull request as ready for review January 9, 2023 09:14
github-actions Bot commented Jan 9, 2023

clang-tidy review says "All clean, LGTM! 👍"

github-actions Bot commented Jan 9, 2023

clang-tidy review says "All clean, LGTM! 👍"

github-actions Bot commented Jan 9, 2023

clang-tidy review says "All clean, LGTM! 👍"

hello-stephen commented Jan 9, 2023

TeamCity pipeline, clickbench performance test result:
the sum of best hot time: 35.28 seconds
load time: 484 seconds
storage size: 17122523706 Bytes
https://doris-community-test-1308700295.cos.ap-hongkong.myqcloud.com/tmp/20230109152720_clickbench_pr_76300.html

github-actions Bot commented Jan 9, 2023

clang-tidy review says "All clean, LGTM! 👍"

@morningman morningman left a comment

LGTM

@github-actions github-actions Bot added the approved label (Indicates a PR has been approved by one committer.) Jan 10, 2023
@github-actions

PR approved by at least one committer and no changes requested.

@github-actions

PR approved by anyone and no changes requested.

Labels

approved (Indicates a PR has been approved by one committer.), area/vectorization, reviewed

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[Enhancement](file reader) Refactor file reader

4 participants